Network management and control using collaborative on-line simulation

ABSTRACT

A collaborative on-line simulation system and method to provide automated and pro-active control functions for computer networks. In a wide area network, clients communicate through one or several nodes (108). Each node (108) contains routers which include a control plane (202) and a data plane (204). Collaborative on-line simulators (206) are interfaced to the network nodes (108) and continuously monitor the surrounding network conditions, communicate with other simulators and execute collaborative on-line simulation. Based on the simulation results, the on-line simulators (206) continuously tune selected network parameters to a more efficient operation point to fit the current network conditions.

BACKGROUND OF THE INVENTION

I. Field of the Invention

The present invention generally relates to computer network data management and control. In particular, the present invention relates to providing a system and method to improve computer network control by providing real-time tuning of the network for better performance.

II. Description of the Related Art

As demand for the Internet and other global network data transfer mechanisms increases, network traffic over these data networks has become problematic. Data packet losses, which require packet re-transmission, and failures of network components have caused networks to experience reduced data transfer rates and, in many cases, network failure due to inefficient network management. Network management involves the collection of data from the network using protocols like SNMP. There are few tools that innovatively interpret this data to predict network faults.

Conventional network simulators are used for network design, and in some cases network planning, in order to design more efficient networks to handle today's increasing demands. These conventional simulators are not used for on-line network control, but rather run in an experimental setting using a representative sample of the network data or a model of the network structure to develop better protocols and mechanisms to transfer data. In addition, conventional simulators are not efficient.

These conventional simulators are now becoming less efficient because the data loads and operating conditions of today's networks vary greatly over time. In order to maintain a more efficient network, there is a need for a mechanism to configure computer networks by using live data where changes in the configuration can be implemented in real-time.

SUMMARY OF THE INVENTION

The present invention provides a collaborative on-line simulation system and method to provide automated and pro-active control functions for computer networks. The system and method introduce autonomous on-line simulators into local networks. These autonomous on-line simulators continuously monitor the surrounding network conditions, collect relevant network parameter information, communicate with other simulators and execute collaborative on-line simulation. Based on the simulation results, the on-line simulators then continuously tune selected network parameters to an efficient operation point to fit the current network conditions.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other advantages and features of the invention will become more apparent from the detailed description of exemplary embodiments provided below with reference to the accompanying drawings in which:

FIG. 1 illustrates a model of network nodes of a wide area computer network;

FIG. 2 illustrates a network node of a local area network of FIG. 1 interfaced to the collaborative on-line simulation system of the present invention;

FIG. 3 illustrates the structure of the collaborative on-line simulation system of FIG. 2;

FIG. 4(a) illustrates a flowchart of the hybrid parameter searching of the present invention;

FIG. 4(b) illustrates the farmer-worker structure of the collaborative on-line simulation system of the present invention; and

FIG. 5 illustrates a processor-based system which incorporates the collaborative on-line simulation system and method of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Referring now to the drawings, where like reference numerals designate like elements, there is shown in FIG. 1 a wide area network 100 for use in various applications, e.g. communications, Internet web page hosting, etc., including local networks 102, 104 and 106. Each of the local networks 102, 104, 106 includes at least one server containing processors, databases, mainframes and other equipment used to distribute data to multiple clients, the users, connected by network nodes 108 through inter-connections 110 and 112. The clients exchange data with clients within the same local network 102, 104 and 106 as well as with those in other local networks 102, 104 and 106 through one or several nodes 108. The nodes 108 are inter-connected to conventional wide area network backbone hardware/software. For example, the wide area network can be the Internet.

As shown in FIG. 2, each network node 108 contains routers which include a control plane 202 and a data plane 204. The control plane 202 and data plane 204 are essentially two separate communication paths: the control plane 202 passes control data, i.e. protocol information describing where the data is traveling, and the data plane 204 passes the data itself. Data is transmitted on the data plane 204 and control signals for network parameters (e.g. protocol parameters) are transmitted on the control plane 202. The control plane 202 and data plane 204 are connected amongst the several network nodes 108 through inter-connections 110 and 112 to form local networks 102, 104, 106 and ultimately a wide area network 100.

Collaborative on-line simulators 206 are interfaced to the network nodes 108 within the local networks 102, 104 and 106. The on-line simulators 206 continuously monitor the surrounding network conditions, collect the relevant information, e.g. on-line protocol parameters through on-line traffic, and exchange information with other simulators by sending information, including advised parameter settings, along line 212 through control plane 202. Based on the information received, simulations are executed by the on-line simulators 206 and parameter search methods are used to evaluate the results of the simulations and search for better network parameters.

The use of simulators as well as the use of various types of simulations by the simulators are well known in the art. However, conventionally the results of these simulations are not used to change network parameters in real-time because the results would be unreliable due to the changing conditions of the wide area network 100 and the large number of network nodes 108 experiencing different conditions. The present invention uses conventional simulations but enables each on-line simulator 206 to use the output (results) of another on-line simulator 206 as input when performing its simulations. In this way the results of each individual on-line simulator 206 are more reliable because each result is based upon current and future network conditions. Thus, the present invention allows the network parameters to continuously change in real-time and have a net overall improvement on the wide area network 100. A dynamic and automatic network control can therefore be achieved. Note that the above on-line simulators 206 interact with the control plane 202. Therefore, the on-line simulators 206 actually accomplish a second-order control over the wide area network 100. In other words, the on-line simulation merely prescribes parameters required for the operation of network protocols and does not interfere with their normal operation in any other way.

FIG. 3 is a block diagram illustrating the architecture of the collaborative on-line simulators 206. Each on-line simulator 206 includes a monitor and on-line modeling unit 302, a management interface unit 304, an experiment design unit 306 and an experiment execution unit 308. The above units may be implemented in software and executed at network nodes 108.

The monitor and on-line modeling unit 302 continually collects information about the local network (e.g., network topology, traffic conditions, etc.) and tries to build the most updated network model to represent current network conditions for use during simulation. The management interface unit 304 is the control center of the on-line simulator 206. The management interface unit 304 controls and synchronizes the operation of all the other units within the on-line simulator 206 while serving as an interface of the on-line simulator 206 with the network nodes 108 through which it is connected along lines 210 and 212 (FIG. 2). The experiment design unit 306 is responsible for setting up simulation experiments with appropriate search techniques (explained below), and analyzing the results of the simulation experiments to perform further searches to find more efficient network parameter settings, if necessary. The experiment execution unit 308 executes the simulations received from the experiment design unit 306 and returns the results to the experiment design unit 306.
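As a rough illustration of how the four units could cooperate, the following Python sketch wires them together for a single tuning cycle. The class name `OnlineSimulator`, the callables `observe`, `design` and `execute`, and the parameter name `max_p` are hypothetical and are not taken from the description; a lower cost is assumed to mean better network performance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Params = Dict[str, float]
Model = Dict[str, float]


@dataclass
class OnlineSimulator:
    """Hypothetical wiring of units 302, 304, 306 and 308."""
    observe: Callable[[], Model]              # stands in for unit 302
    design: Callable[[Params], List[Params]]  # stands in for unit 306
    execute: Callable[[Model, Params], float]  # stands in for unit 308

    def tune_once(self, current: Params) -> Params:
        """Stands in for unit 304: coordinate one tuning cycle."""
        model = self.observe()  # most recent network model
        best, best_cost = current, self.execute(model, current)
        for candidate in self.design(current):
            cost = self.execute(model, candidate)
            if cost < best_cost:  # keep only settings that improve
                best, best_cost = candidate, cost
        return best


# Toy stand-ins, purely for illustration.
sim = OnlineSimulator(
    observe=lambda: {"load": 0.8},
    design=lambda p: [{"max_p": p["max_p"] + d} for d in (-0.05, 0.05)],
    execute=lambda model, p: (p["max_p"] - 0.10) ** 2,
)
tuned = sim.tune_once({"max_p": 0.20})
```

Because the current setting is always evaluated first, the returned setting is never worse than the starting point under the same model.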

Besides interacting with the local network 102, 104, 106, each on-line simulator 206 also communicates with other simulators and exchanges the relevant network parameter information, such as network traffic models and efficient network parameters. Thus, a collaborative, scalable on-line simulation network is formed. The network is scalable in that with the addition of each additional network node 108 additional collaborative on-line simulators 206 may be added which will work in conjunction with previously existing simulators to change network parameters in real-time. Through this, each of the local on-line simulators 206 acquires a global view of the network and thus is able to perform better network simulation and control.

Since the network conditions keep changing all the time, the on-line simulation system and method also requires a fast experiment design method to quickly finish the simulation experiments and find efficient network parameter settings before the underlying network information becomes stale. The goal is to use as few experiments as possible to find as efficient a parameter setting as possible. Note that the emphasis is not on seeking the optimum setting. Instead, a best-effort strategy is adopted to find a better operating point within a limited time frame. Thus, the search and simulations can be interrupted at any time and still produce a result better than the starting point. This provides the possibility to make a compromise between the quality of the result and the search time to obtain the result. In a preferred embodiment, the Random Early Drop (RED) queuing management algorithm is used as the underlying network algorithm to be adjusted because of its sensitivity to parameter settings.
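To indicate why RED reacts strongly to its settings, the following Python sketch shows the basic drop decision as it is usually described in the literature. The parameter names `min_th`, `max_th`, `max_p` and `w_q` are assumptions borrowed from that common description; the present description does not state which RED parameters are tuned.

```python
def update_average(avg_queue, sample, w_q):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - w_q) * avg_queue + w_q * sample


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Base drop probability as a function of the average queue length."""
    if avg_queue < min_th:
        return 0.0  # short queue: never drop
    if avg_queue >= max_th:
        return 1.0  # long queue: always drop
    # In between, the probability rises linearly from 0 to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

Small changes to the thresholds or to `max_p` move the point at which packets start to be dropped, which is one reason such an algorithm is a natural candidate for on-line tuning.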

To accomplish a speedy result, the present invention implements a two-part hybrid search method in the concerned parameter space as shown in the flowchart of FIG. 4(a).

First a high level pruning step occurs. The search space is probed to determine the important parameters which will have the most effect on network performance (step 470). After pruning part of the search space by ignoring less important parameters, those remaining parameters are searched in more detail (step 472).

In a preferred embodiment, the high level pruning occurs as follows. The search space is probed by the on-line simulators 206 conducting simulations in portions of the parameter space, specifically the boundaries of the space. These simulations will be based upon a 2^(k) full factorial experiment design. 2^(k) full factorial design is known in the art of performance analysis.

2^(k) full factorial design examines all possible combinations of the parameter boundaries and fits the parameter boundary results into a non-linear regression model. The model analyzes the importance of different parameters. The above method is not an iterative method. Instead, to achieve an increasingly refined result, simulations are ordered to form a series of subsets and the first subset is generated by applying 2^(k−p) fractional factorial design on the parameter space. “p” is the minimum integer satisfying 2^(k−p)≧k, which is required by the regression analysis. 2^(k−p) fractional factorial design is a technique which executes only part of the experiments in 2^(k) full factorial design.
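The full factorial step can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, each parameter is coded as -1 (lower boundary) or +1 (upper boundary), and only main effects are estimated, so the interaction terms of the regression model are omitted for brevity.

```python
from itertools import product


def full_factorial(k):
    """All 2^k corner points, coded -1 (lower bound) or +1 (upper bound)."""
    return list(product((-1, 1), repeat=k))


def main_effects(design, responses):
    """Regression coefficient of each factor via the sign-table method."""
    n = len(design)
    return [sum(row[i] * y for row, y in zip(design, responses)) / n
            for i in range(len(design[0]))]


def importance(design, responses):
    """Fraction of the total variation explained by each main effect."""
    n = len(design)
    mean = sum(responses) / n
    total = sum((y - mean) ** 2 for y in responses)
    return [n * q * q / total if total else 0.0
            for q in main_effects(design, responses)]


# Toy response in which the first parameter dominates.
design = full_factorial(3)
responses = [10 + 4 * a + 1 * b + 0 * c for a, b, c in design]
effects = main_effects(design, responses)
shares = importance(design, responses)
```

In this toy case the third parameter explains none of the variation and would be a candidate for pruning.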

By carefully selecting the simulations, the analysis of the parameter importance can be executed with improved speed at only minimal expense to accuracy. After finishing a subset of simulations and analyzing the simulation results, the next larger subset, which is obtained by using 2^(k−p+1) fractional factorial design, is analyzed, and so on until all 2^(k) simulations are finished. During this process, if the search is interrupted, the analysis result based on the last subset of simulations is returned as the “best-so-far” result. Thus, the network still has been tuned for better efficiency.
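The progression through growing subsets, with a best-so-far result on interruption, might be sketched in Python as below. Several things here are assumptions: the function names, the way generator columns are chosen (products of at least two base factors), and the starting value of p, which the caller supplies. With this simple generator choice successive designs are not guaranteed to be nested, so the cache only avoids re-running points that happen to coincide.

```python
from itertools import combinations, product
from math import prod


def fractional_factorial(k, p):
    """2^(k-p) design: full factorial on k-p base factors; each remaining
    factor is the product of a distinct set of at least two base factors."""
    base = k - p
    generators = [c for size in range(2, base + 1)
                  for c in combinations(range(base), size)][:p]
    if len(generators) < p:
        raise ValueError("too few runs to keep main effects distinct")
    return [row + tuple(prod(row[i] for i in g) for g in generators)
            for row in product((-1, 1), repeat=base)]


def main_effects(design, responses):
    """Main-effect coefficient of each factor (sign-table method)."""
    n = len(design)
    return [sum(row[i] * y for row, y in zip(design, responses)) / n
            for i in range(len(design[0]))]


def progressive_screening(k, p_start, simulate, interrupted=lambda: False):
    """Analyze designs of growing size; return the last completed analysis."""
    cache, best_so_far = {}, None
    for p in range(p_start, -1, -1):  # 2^(k-p), 2^(k-p+1), ..., 2^k runs
        design = fractional_factorial(k, p)
        responses = []
        for row in design:
            if interrupted():
                return best_so_far  # result of the last finished subset
            if row not in cache:
                cache[row] = simulate(row)
            responses.append(cache[row])
        best_so_far = main_effects(design, responses)
    return best_so_far


# Toy simulation with five parameters; k=5 and p_start=2 are illustrative.
calls = {"n": 0}


def toy_simulate(x):
    calls["n"] += 1
    return 10 + 4 * x[0] + 1 * x[1]


effects = progressive_screening(5, 2, toy_simulate)
```

If `interrupted` returns true before the first subset is finished, the function returns `None`, which a caller would treat as "keep the current settings".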

Second, once the high level pruning is complete, the next task is to search the remaining parameter space in detail with state space search techniques. Basically, the state space search method includes two important components: exploration and exploitation, and a balance strategy between them (steps 474 and 476). Exploration encourages the search process to examine unknown regions. Exploitation attempts to converge to a maximum or minimum in the vicinity of a chosen region.
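One simple way to balance exploration and exploitation is sketched below in Python. The description does not name a specific state space search algorithm, so this is only an assumed stand-in: with probability `explore_prob` the search samples a random point in the remaining parameter space (exploration), otherwise it perturbs the best point found so far (exploitation). All names and default values are hypothetical.

```python
import random


def explore_exploit_search(cost, start, bounds, budget=200,
                           explore_prob=0.3, step=0.1, seed=0):
    """Best-effort minimization of `cost` inside the box given by `bounds`."""
    rng = random.Random(seed)
    best, best_cost = list(start), cost(start)
    for _ in range(budget):  # budget = number of simulations allowed
        if rng.random() < explore_prob:
            # Exploration: jump to a uniformly random point in the box.
            candidate = [rng.uniform(lo, hi) for lo, hi in bounds]
        else:
            # Exploitation: small random step around the best point,
            # clipped so that it stays inside the box.
            candidate = [min(hi, max(lo, x + rng.gauss(0, step * (hi - lo))))
                         for x, (lo, hi) in zip(best, bounds)]
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:  # accept improvements only
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```

Since only improvements are accepted, stopping the loop early (a smaller `budget`) still yields a setting at least as good as `start`, in line with the best-effort strategy described above.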

Thus, the hybrid on-line simulation method, and the system that implements it, use a best-effort strategy in their second-order control, whose emphasis is not on full optimization, but on continuously and increasingly moving the system towards a better operating point. The present invention continuously tunes the underlying network protocols (albeit at a larger time-scale than their normal operation) and therefore equips the network management infrastructure with “pro-active” management capabilities.

In another preferred embodiment, simulation execution is sped up using parallel execution of the simulations. FIG. 4(b) illustrates a parallel processing architecture using a farmer-worker infrastructure. The farmer-worker infrastructure of FIG. 4(b) allows for distribution of many single-machine simulations across multiple workstations. The dispatcher 402 is the interface between the distributed simulation executers (the “workers”) 406, 408, 410 and the experiment design unit 306. All the simulations pass through this dispatcher 402, which hands them over for distribution among the workers 406, 408, 410. The farmer 404 is the center of this infrastructure and routes the operations of the dispatcher 402 and workers 406, 408, 410. The farmer 404 may use conventional distributed network architecture queuing schemes to distribute and route simulations amongst the workers 406, 408, 410, where the workers 406, 408, 410 are the actual simulation executers. The above farmer-worker infrastructure can use multiple workers 406, 408, 410 for the same experiment design unit 306 to speed up the simulation process. In a preferred embodiment, all the communication in this scheme is through TCP connections. Therefore, the dispatcher 402, farmer 404, and workers 406, 408, 410 can be located anywhere in the network. Thus, experiments can be evenly distributed over the whole wide area network to maximize the utilization of the computing resources.
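The farmer-worker idea can be illustrated with a small in-process Python sketch. Threads and a shared queue stand in for the workstations and TCP connections of the preferred embodiment; the function names `farm` and `worker` are hypothetical, and no error handling is included.

```python
import queue
import threading


def worker(tasks, results, simulate):
    """Worker: pull simulations from the queue until told to stop."""
    while True:
        item = tasks.get()
        if item is None:  # stop signal from the farmer
            break
        index, setting = item
        results[index] = simulate(setting)


def farm(simulate, settings, n_workers=3):
    """Farmer: queue the simulations, start the workers, collect results."""
    tasks = queue.Queue()
    results = [None] * len(settings)
    threads = [threading.Thread(target=worker, args=(tasks, results, simulate))
               for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for item in enumerate(settings):  # what the dispatcher hands over
        tasks.put(item)
    for _ in threads:                 # one stop signal per worker
        tasks.put(None)
    for thread in threads:
        thread.join()
    return results                    # same order as `settings`
```

Because each result is stored under the index of its setting, the experiment design unit receives results in the order in which it submitted the experiments, regardless of which worker finished first.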

Referring now to FIG. 5, each network node 102, 104 and 106 may contain a processor-based system 500 for implementing the above described system and method. The processor-based system 500 includes a central processing unit (CPU) 502, for example, a microprocessor, that communicates with one or more input/output (I/O) devices 508, 510 over a bus 516. The processor-based system 500 also includes random access memory (RAM) 512, a read only memory (ROM) 514 and may include peripheral devices such as a disk drive 504 and CD-ROM drive 506 which also communicate with CPU 502 over the bus 516. Memory 512 can be configured to store the collaborative on-line simulation system and method for the present invention as described above. It may also be desirable to integrate the processor 502 and memory 512 on a single integrated chip.

Hence, the present invention provides a system and method for improving the efficiency of a computer network by the use of on-line simulators which execute at least one simulation based upon current network conditions and tune network parameters in real time.

Although the invention has been described above in connection with exemplary embodiments, it is apparent that many modifications and substitutions can be made without departing from the spirit or scope of the invention. In particular, although the invention is described with reference to tuning network protocol parameters, the system and method can also be applied to other aspects of a computer network such as routing. Likewise, the system can be implemented on a UNIX, LINUX or any other operating system. Accordingly, the invention is not to be considered as limited by the foregoing description, but is only limited by the scope of the appended claims.

1. A system for improving the efficiency of a data network, said system comprising: a plurality of simulators, said plurality of simulators receiving current network parameters and changing said network parameters for more efficient performance; and a plurality of network nodes providing said current network parameters to each of said plurality of simulators.
 2. The system of claim 1 further comprising a monitor and on-line modeling unit, said monitor and on-line modeling unit continually collecting at least one network parameter.
 3. The system of claim 1 further comprising a management interface unit, said management interface unit routing the operation of components within said plurality of simulators.
 4. The system of claim 1 further comprising an experimental design unit, said experimental design unit determining and/or setting up simulation experiments and/or analyzing a result of said simulation experiments to set said network parameters.
 5. The system of claim 4 further comprising an experiment execution unit, said experiment execution unit executing the simulation experiments received from said experimental design unit.
 6. The system of claim 1, wherein each of said plurality of network nodes comprises a control plane coupled to at least one of said plurality of simulators.
 7. A processor-based system comprising: a processor; and a memory device coupled to said processor, said memory device containing a device for continuously changing network parameters to improve the efficiency of a data network, said device comprising: a plurality of simulators, said plurality of simulators receiving current network parameters and changing said network parameters for more efficient performance; and a plurality of network nodes providing said current network parameters to each of said plurality of simulators.
 8. The system of claim 7 further comprising a monitor and on-line modeling unit, said monitor and on-line modeling unit continually collecting at least one network parameter.
 9. The system of claim 7 further comprising a management interface unit, said management interface unit routing the operation of components within said plurality of simulators.
 10. The system of claim 7 further comprising an experimental design unit, said experimental design unit determining and/or setting up simulation experiments and/or analyzing a result of said simulation experiments to set said network parameters.
 11. The system of claim 10 further comprising an experiment execution unit, said experiment execution unit executing the simulation experiments received from said experimental design unit.
 12. The system of claim 7, wherein each of said plurality of network nodes comprises a control plane coupled to at least one of said plurality of simulators.
 13. An integrated memory circuit comprising: a die containing a processor and memory device, said memory device containing a device for improving the efficiency of a data network, said device comprising: a plurality of simulators, said plurality of simulators receiving current network parameters and changing said network parameters for more efficient performance; and a plurality of network nodes providing said current network parameters to each of said plurality of simulators.
 14. The circuit of claim 13 further comprising a monitor and on-line modeling unit, said monitor and on-line modeling unit continually collecting at least one network parameter.
 15. The circuit of claim 13 further comprising a management interface unit, said management interface unit routing the operation of components within said plurality of simulators.
 16. The circuit of claim 13 further comprising an experimental design unit, said experimental design unit determining and/or setting up simulation experiments and/or analyzing a result of said simulation experiments to set said network parameters.
 17. The circuit of claim 16 further comprising an experiment execution unit, said experiment execution unit executing the simulation experiments received from said experimental design unit.
 18. The circuit of claim 13, wherein each of said plurality of network nodes comprises a control plane coupled to at least one of said plurality of simulators.
 19. A method for improving the efficiency of a data network, said method comprising: searching a plurality of network nodes for current network parameters; conducting a plurality of simulation experiments with said current network parameters; providing improved network parameters to said plurality of network nodes in response to said simulation experiments; and changing said current network parameters to said improved network parameters in real-time.
 20. The method of claim 19 further comprising providing said improved network parameters to other simulators.
 21. The method of claim 19, wherein said search of said current network parameters is performed using a 2^(k) full factorial methodology.
 22. The method of claim 19, wherein said search of said current network parameters is performed using exploration and exploitation.
 23. The method of claim 19, wherein multiple simulation experiments are processed in parallel.
 24. The method of claim 23, wherein said parallel processing of said multiple simulation experiments is conducted over a plurality of simulation execution devices.
 25. The method of claim 19, wherein Random Early Drop queuing management is used as the underlying network scheme.
 26. The method of claim 19, wherein said simulation experiments are performed on a UNIX based computer system.
 27. The method of claim 19, wherein said simulation experiments are performed on a LINUX based computer system. 